18 research outputs found

    STRUCTURAL MODELING OF PROTEIN-PROTEIN INTERACTIONS USING MULTIPLE-CHAIN THREADING AND FRAGMENT ASSEMBLY

    Since its birth, the study of protein structures has progressed by leaps and bounds. However, owing to the expense and difficulty involved, the number of solved protein structures has not kept pace with the number of protein sequences and has in fact steadily lost ground. This has necessitated the development of high-throughput yet accurate computational algorithms capable of predicting the three-dimensional structure of a protein from its amino acid sequence. While progress has been made in protein tertiary structure prediction, progress in protein quaternary structure prediction has been limited because protein complexes have even more degrees of freedom and even fewer protein complex structures are present in the PDB library. In fact, protein complex structure prediction has to date largely remained a docking problem, in which automated algorithms aim to predict the complex structure starting from the unbound crystal structures of its component subunits, and it has therefore remained limited in scope. Moreover, since docking essentially treats the unbound subunits as "rigid bodies", it has limited accuracy when conformational change accompanies protein-protein interaction. In one of the first efforts of its kind, this study aims to develop protein complex structure prediction algorithms that require only the amino acid sequences of the interacting subunits as input. The study adapts the best features of protein tertiary structure prediction, including template detection and ab initio loop modeling, and extends them to protein-protein complexes, which requires simultaneously modeling the three-dimensional structures of the component subunits and ensuring the correct orientation of the chains at the protein-protein interface. The algorithms depend on knowledge-based statistical potentials for both fold recognition and structure modeling. 
First, as a way to compare known structures of protein-protein complexes, a complex structure alignment program, MM-align, was developed. MM-align joins the chains of the complex structures to be aligned to form artificial monomers in every possible order. It then aligns them using a heuristic dynamic programming approach with TM-score as the objective function. The traditional Needleman-Wunsch (NW) dynamic programming was redesigned to prevent cross-alignment of chains during the structure alignment process. Driven by the knowledge obtained from MM-align that protein complex structures share evolutionary relationships, and that the current protein complex structure library already contains homologous or structurally analogous protein quaternary structure families, a dimeric threading approach, COTH, was designed. The new threading-recombination approach boosts the protein complex structure library by combining tertiary structure templates with complex alignments. The query sequences are first aligned to complex templates using the modified dynamic programming algorithm, guided by a number of predicted structural features including ab initio binding-site predictions. Finally, a template-based complex structure prediction approach, TACOS, was designed to build full-length protein complex structures starting from the initial templates identified by COTH. TACOS fragments the aligned regions of the templates and reassembles them while building the structure of the threading-unaligned regions ab initio using a replica-exchange Monte Carlo simulation procedure. Simultaneously, TACOS searches for the best orientation match of the component structures, driven by a number of knowledge-based potential terms. Overall, TACOS presents one of the first approaches capable of predicting full-length protein complex structures from sequence alone and introduces a new paradigm in the field of protein complex structure modeling.
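
The replica-exchange Monte Carlo procedure mentioned in the abstract above can be illustrated with the generic temperature-swap rule. This is a minimal sketch of the textbook acceptance criterion only: the move set, energy terms, and temperature ladder used by TACOS are not given in the abstract, and the names `swap_probability` and `attempt_neighbor_swaps` are hypothetical.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j, k_b=1.0):
    """Standard replica-exchange acceptance probability for swapping the
    configurations held at temperatures temp_i and temp_j.

    p = min(1, exp((beta_i - beta_j) * (E_i - E_j))), with beta = 1 / (k_b * T).
    """
    beta_i = 1.0 / (k_b * temp_i)
    beta_j = 1.0 / (k_b * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


def attempt_neighbor_swaps(configs, energies, temps, rng=random):
    """Try to swap each pair of neighbouring replicas once, in order.

    configs[k] and energies[k] belong to the replica currently running at
    temps[k]. Returns new lists; the inputs are left unchanged.
    """
    configs = list(configs)
    energies = list(energies)
    for k in range(len(temps) - 1):
        p = swap_probability(energies[k], energies[k + 1], temps[k], temps[k + 1])
        if rng.random() < p:
            configs[k], configs[k + 1] = configs[k + 1], configs[k]
            energies[k], energies[k + 1] = energies[k + 1], energies[k]
    return configs, energies
```

A swap that moves the lower-energy configuration to the colder replica is always accepted; the reverse swap is accepted with a Boltzmann-weighted probability, which is what lets trapped low-temperature replicas escape local minima.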

    MM-align: a quick algorithm for aligning multiple-chain protein complex structures using iterative dynamic programming

    Structural comparison of multiple-chain protein complexes is essential in many studies of protein–protein interactions. We develop a new algorithm, MM-align, for sequence-independent alignment of protein complex structures. The algorithm is built on a heuristic iteration of a modified Needleman–Wunsch dynamic programming (DP) algorithm, with the alignment score specified by the inter-complex residue distances. The multiple chains in each complex are first joined, in every possible order, and then simultaneously aligned with cross-chain alignments prevented. The alignments of interface residues are enhanced by an interface-specific weighting factor. MM-align is tested on a large-scale benchmark set of 205 × 3897 non-homologous multiple-chain complex pairs. Compared with a naïve extension of the monomer alignment program TM-align, the alignment accuracy of MM-align is significantly higher as judged by the average TM-score of the physically aligned residues. MM-align is about two times faster than TM-align because it omits the cross-alignment zone of the DP matrix. The results also show that the enhanced alignment of the interfaces helps in identifying biologically relevant protein complex pairs. Funding: Alfred P. Sloan Foundation; NSF Career Award (DBI 0746198); and the National Institute of General Medical Sciences (R01GM083107, R01GM084222). Funding for open access charge: Alfred P. Sloan Research Fellowship.
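
The idea of a Needleman–Wunsch recursion with cross-chain alignments prevented can be sketched as follows. This is an illustrative toy, not the MM-align implementation: it assumes a precomputed residue-by-residue score matrix, a fixed linear gap penalty, and a set `allowed` of chain pairings, and it omits the TM-score iteration and interface weighting described in the abstract. The function name `blocked_nw` and all parameters are hypothetical.

```python
import math


def blocked_nw(score, chains_a, chains_b, allowed, gap=-1.0):
    """Global alignment of two joined complexes.

    score[i][j]  : similarity of residue i (complex A) and residue j (complex B)
    chains_a[i]  : chain label of residue i in complex A
    chains_b[j]  : chain label of residue j in complex B
    allowed      : set of (chain_in_A, chain_in_B) pairs that may be matched

    Residues from chain pairs not in `allowed` can never be matched; those
    cells of the DP matrix are reachable only through gap moves.
    """
    n, m = len(chains_a), len(chains_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
        back[i][0] = "up"
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
        back[0][j] = "left"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            same_pairing = (chains_a[i - 1], chains_b[j - 1]) in allowed
            # Forbid the match move for cross-chain residue pairs.
            diag = dp[i - 1][j - 1] + score[i - 1][j - 1] if same_pairing else -math.inf
            up = dp[i - 1][j] + gap
            left = dp[i][j - 1] + gap
            best = max(diag, up, left)
            dp[i][j] = best
            back[i][j] = "diag" if best == diag else ("up" if best == up else "left")

    # Trace back from the bottom-right corner to recover matched residue pairs.
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:
        move = back[i][j]
        if move == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif move == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs
```

Because forbidden cells never contribute a match, an implementation can skip computing scores for them entirely, which is consistent with the speed-up the abstract attributes to omitting the cross-alignment zone.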

    Streaming Adaptation of Deep Forecasting Models using Adaptive Recurrent Units

    We present ARU, an Adaptive Recurrent Unit for streaming adaptation of deep globally trained time-series forecasting models. The ARU combines the advantages of learning complex data transformations across multiple time series from deep global models with the per-series localization offered by closed-form linear models. Unlike existing methods of adaptation that are either memory-intensive or non-responsive after training, ARUs require only fixed-size state and adapt to streaming data via an easy RNN-like update operation. The core principle driving ARU is simple: maintain sufficient statistics of conditional Gaussian distributions and use them to compute local parameters in closed form. Our contribution is in embedding such local linear models in globally trained deep models while allowing end-to-end training on the one hand, and easy RNN-like updates on the other. Across several datasets we show that ARU is more effective than recently proposed local adaptation methods that tax the global network to compute local parameters. Comment: 9 pages, 4 figures
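
The principle of keeping sufficient statistics and solving for local parameters in closed form can be illustrated with a scalar example. This is a sketch under strong assumptions, not the ARU itself: it fits a one-dimensional linear-Gaussian model of y given x, uses an optional exponential decay that is our own addition, and is not embedded in any deep network. The class name `StreamingLinearState` is hypothetical.

```python
class StreamingLinearState:
    """Fixed-size state for a streaming fit of y ~ w * x + b.

    Only five running sums are stored, so memory does not grow with the
    length of the stream. The parameters are recomputed in closed form from
    those sums, as the conditional mean of y given x for a joint Gaussian.
    """

    def __init__(self, decay=1.0, eps=1e-8):
        self.decay = decay  # 1.0 keeps all history; <1.0 forgets old points
        self.eps = eps      # guards against division by zero variance
        self.n = self.sx = self.sy = self.sxx = self.sxy = 0.0

    def update(self, x, y):
        """RNN-like update: new state depends only on old state and (x, y)."""
        d = self.decay
        self.n = d * self.n + 1.0
        self.sx = d * self.sx + x
        self.sy = d * self.sy + y
        self.sxx = d * self.sxx + x * x
        self.sxy = d * self.sxy + x * y

    def params(self):
        """Closed-form slope and intercept from the sufficient statistics."""
        if self.n == 0.0:
            return 0.0, 0.0
        mean_x = self.sx / self.n
        mean_y = self.sy / self.n
        var_x = self.sxx / self.n - mean_x * mean_x
        cov_xy = self.sxy / self.n - mean_x * mean_y
        w = cov_xy / (var_x + self.eps)
        b = mean_y - w * mean_x
        return w, b

    def predict(self, x):
        w, b = self.params()
        return w * x + b
```

Feeding points from the line y = 3x + 1 one at a time recovers a slope close to 3 and an intercept close to 1 without ever storing the stream.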

    Revisiting the Plasmodium falciparum RIFIN family: from comparative genomics to 3D-model prediction

    Background: Subtelomeric RIFIN genes constitute the most abundant multigene family in Plasmodium falciparum. RIFIN products are targets for the human immune response and contribute to the antigenic variability of the parasite. They are transmembrane proteins grouped into two sub-families (RIF_A and RIF_B). Although recent data show that RIF_A and RIF_B have different sub-cellular localisations and possibly different functions, the same structural organisation has been proposed for members of the two sub-families. Despite recent advances, our knowledge of the regulation of RIFIN gene expression is still poor and the biological role of the protein products remains obscure.
    Results: Comparative studies on RIFINs in three clones of P. falciparum (3D7, HB3 and Dd2) by multidimensional scaling (MDS) showed that gene sequences evolve differently in the 5' upstream, coding, and 3' downstream regions, and suggested a possible role of highly conserved 3' downstream sequences. Despite the expected polymorphism, we found that the overall structure of RIFIN repertoires is conserved among clones, suggesting a balance between genetic drift and homogenisation mechanisms which guarantees the emergence of novel variants but preserves the functionality of genes. Protein sequences from a bona fide set of 3D7 RIFINs were submitted to predictors of secondary structure elements. In contrast with the previously proposed structural organisation, no signal peptide and only one transmembrane helix were predicted for the majority of RIF_As. Finally, we developed a strategy to obtain a reliable 3D-model for RIF_As. We generated 265 possible structures from 53 non-redundant sequences, from which clustering and quality assessments selected two models as the most representative for putative RIFIN protein structures.
    Conclusion: First, comparative analyses of RIFIN repertoires in different clones of P. falciparum provide insights into the evolutionary mechanisms shaping the multigene family. Secondly, we found that members of the two sub-families RIF_As and RIF_Bs have different structural organisations, in accordance with recent experimental results. Finally, representative models for RIF_As have an "Armadillo-like" fold, which is known to promote protein-protein interactions in diverse contexts.

    Following Natures Lead: On the Construction of Membrane-Inserted Toxins in Lipid Bilayer Nanodiscs

    Bacterial toxin or viral entry into the cell often requires cell surface binding and endocytosis. Endosomal acidification induces a limited unfolding/refolding and membrane insertion reaction that converts the soluble toxins or viral proteins into their translocation-competent or membrane-inserted states. At the molecular level, the specific orientation and immobilization of the pre-transitioned toxin on the cell surface is often an important prerequisite for cell entry. We propose that structures of some toxin membrane insertion complexes may be observed through procedures in which one rationally immobilizes the soluble toxin so that potential unfolding ↔ refolding transitions that occur prior to membrane insertion orientate away from the immobilization surface in the presence of lipid micelle pre-nanodisc structures. As a specific example, the immobilized prepore form of the anthrax toxin pore translocon, or protective antigen, can be transitioned, inserted into a model lipid membrane (nanodiscs), and released from the immobilized support in its membrane-solubilized form. This particular strategy, although unconventional, is a useful procedure for generating pure membrane-inserted toxins in nanodiscs for electron microscopy structural analysis. In addition, generating a similar immobilized platform on label-free biosensor surfaces allows one to observe the kinetics of these acid-induced membrane insertion transitions. These platforms can facilitate the rational design of inhibitors that specifically target the toxin membrane insertion transitions that occur during endosomal acidification. This approach may lead to a new class of direct anti-toxin inhibitors.

    QAUST: Protein Function Prediction Using Structure Similarity, Protein Interaction, and Functional Motifs

    The number of available protein sequences in public databases is increasing exponentially. However, a significant percentage of these sequences lack functional annotation, which is essential for understanding how biological systems operate. Here, we propose a novel method, Quantitative Annotation of Unknown STructure (QAUST), to infer protein functions, specifically Gene Ontology (GO) terms and Enzyme Commission (EC) numbers. QAUST uses three sources of information: structure information encoded by global and local structure similarity search, biological network information inferred from protein–protein interaction data, and sequence information extracted from functionally discriminative sequence motifs. These three pieces of information are combined by consensus averaging to make the final prediction. Our approach has been tested on 500 protein targets from the Critical Assessment of Functional Annotation (CAFA) benchmark set. The results show that our method provides accurate functional annotation and outperforms other prediction methods based on sequence similarity search or threading. We further demonstrate that a previously unknown function of human tripartite motif-containing 22 (TRIM22) protein predicted by QAUST can be experimentally validated.
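
Consensus averaging of per-source predictions can be sketched as below. The abstract does not specify how QAUST scores or weights its three sources, so this example assumes each source yields a confidence in [0, 1] per candidate term, treats a term missing from a source as scoring 0, and weights all sources equally. The function name, term labels, and scores are hypothetical placeholders, not QAUST output.

```python
def consensus_average(source_scores):
    """Combine per-term confidence scores from several sources by averaging.

    source_scores: list of dicts, one per source, mapping term -> score.
    A term absent from a source counts as 0 for that source (an assumption).
    Returns a dict mapping every term seen in any source to its mean score.
    """
    if not source_scores:
        return {}
    terms = set()
    for scores in source_scores:
        terms.update(scores)
    k = len(source_scores)
    return {t: sum(s.get(t, 0.0) for s in source_scores) / k for t in terms}


def rank_terms(consensus):
    """Order terms from most to least confident (ties broken by term name)."""
    return sorted(consensus, key=lambda t: (-consensus[t], t))


# Placeholder inputs standing in for structure, interaction, and motif evidence.
structure = {"TERM_1": 0.9, "TERM_2": 0.3}
interaction = {"TERM_1": 0.6}
motif = {"TERM_1": 0.3, "TERM_3": 0.6}
combined = consensus_average([structure, interaction, motif])
```

Under these assumptions a term supported by all three sources outranks a term with a single strong vote, which is the usual motivation for consensus schemes.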

    How Many Protein-Protein Interactions Types Exist in Nature?

    “Protein quaternary structure universe” refers to the ensemble of all protein-protein complexes across all organisms in nature. The number of quaternary folds thus corresponds to the number of ways proteins physically interact with other proteins. This study focuses on answering two basic questions: whether the number of protein-protein interactions is limited and, if so, how many different quaternary folds exist in nature. By all-to-all sequence and structure comparisons, we grouped the protein complexes in the Protein Data Bank (PDB) into 3,629 families and 1,761 folds. A statistical model was introduced to obtain the quantitative relation between the numbers of quaternary families and quaternary folds in nature. The total number of possible protein-protein interactions was estimated at around 4,000, which indicates that the current protein repository contains only 42% of the quaternary folds in nature and that full coverage needs approximately a quarter century of experimental effort. The results have important implications for protein complex structural modeling and the structure genomics of protein-protein interactions.

    The number of new complex structure entries deposited per year in the PDB.

    Data are presented in terms of unique structures (sequence identity <90%), families (mapped with unique Pfam families), and folds (rTM-score <0.5).

    Number of estimated complex folds for a range of numbers of complex families.
